Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the random read behavior in StressWorkerBench #18000

Merged
merged 23 commits into from
Sep 22, 2023

Conversation

voddle
Copy link
Contributor

@voddle voddle commented Aug 16, 2023

What changes are proposed in this pull request?

Improve the randomness of StressWorkerBench random read test, now each thread throw dice every time it trying to do a random read.

Why are the changes needed?

In previous StressWorkerBench random read test each thread read from same offset and same length everytime, this cause low randomness.

Does this PR introduce any user facing changes?

no

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! I added some comments.

@voddle voddle force-pushed the StressWorkerBench-random-read branch from a35dda2 to 4cb7ac9 Compare August 30, 2023 02:50
Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the work!

@@ -415,6 +396,10 @@ private BenchThread(BenchContext context, int targetFileIndex, FileSystem fs) {
mResult.setParameters(mParameters);
mResult.setBaseParameters(mBaseParameters);
mIsRandomRead = mParameters.mIsRandom;
mRandom = new Random(mParameters.mRandomSeed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are using the same random seed for all threads, then wouldn't the offset and length sequence of all threads be the same as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your motivation now. The threads were reading from one single offset and with the same length throughout the benchmark, and this PR makes them read with different offsets and lengths. Makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, i used to worry the same seed problem as well, but i think as all threads read enough times it won't be matter.

private final Random mRandom;
private final int mRandomMax;
private final int mRandomMin;
private final int mFileSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using int for file sizes limits the file to be smaller that 2.1GB. Can you either add a check to the cli parser to disallow input of files larger than 2.1GB, or change it to long?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consequently, offset should be check in the same way as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changed to long, and StressWorkerBench will make sure the MaxReadLength is within 2.1G when validating the parameters. since the InputStream.read() method only support read length in integer.

Comment on lines +394 to +395
private final long mRandomMax;
private final long mRandomMin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length should be int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since these two value will evaluated with long type mFileSize later, i think long would be better

@voddle voddle force-pushed the StressWorkerBench-random-read branch from a621637 to d7c81f9 Compare September 22, 2023 06:45
Copy link
Contributor

@dbw9580 dbw9580 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jiacheliu3 jiacheliu3 added the type-feature This issue is a feature request label Sep 22, 2023
@jiacheliu3
Copy link
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit be6974c into Alluxio:main Sep 22, 2023
12 checks passed
* generate random number in range [min, max] (include both min and max).
*/
private long randomNumInRange(long min, long max) {
return ThreadLocalRandom.current().nextLong(min, max + 1) + min;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note we are using thread local random here, so does the --seed parameter still take effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we no longer need this param

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants